Fix race condition when rendering the UI #774
Ooofff! This was quite the Pandora's box. Fix a bug, 3 new hidden/pre-existing race conditions appear. I hope I got them all (or at least the major ones). It's ready for review. It does not depend on anything, but may require additional testing. If everything goes fine, I think this fix can be (back-)ported to pretty much all branches! Btw: What's the deal with ign-gui? @iche033 said the goal was to merge these two files, but I don't know when/where ign-gui's Scene3D.cc file is used. Considering they're almost identical, I suspect these patches have to be ported there too, but I don't know how to test it because I don't know how to trigger ign-gui's version.
Yup, here's the issue for that: gazebosim/gz-gui#137. We're currently studying the possibility of doing it as part of #556.
The ign_rviz project makes a more interesting use of that
I made some minor coding style comments.
Here are a few things I tested that are working:
- adding and removing shapes and lights
- moving models with transform tool and component inspector
- image display
- lidar visualization
- collisions visualization
- setting transparency
- video recording
- follow mode
I ran into a couple of situations where the GUI froze:
- Using the Move To function (right click on an object and select Move To): this should be fixed by adding renderSync->ReleaseQtThreadFromBlock(lock); before this return line: https://github.com/ignitionrobotics/ign-gazebo/blob/89b5c9b912f554ffa784d7430a400e4cb93a3e37/src/gui/plugins/scene3d/Scene3D.cc#L826
- Moving the camera when no Qt GUI elements are being updated. For example, when the simulation is paused, the Real Time Factor (RTF) % number on the bottom right of the screen is static - in this case sometimes camera movement locks up. You can also right click on this Qt widget, select close, then move the camera to reproduce the issue.
- I have not found the cause of this freeze yet.
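To illustrate why the Move To freeze needs a release call before the early return, here is a minimal, self-contained sketch. It is not the PR's code: SyncSketch, RenderStep and EarlyOutReleasesWaiter are hypothetical names, and the member function only borrows the name ReleaseQtThreadFromBlock from the comment above; its real signature and behaviour are assumed. The point it shows is that every return path taken by the worker has to wake the thread that is waiting on it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the PR's RenderSync; the layout is an assumption.
struct SyncSketch
{
  std::mutex mutex;
  std::condition_variable cv;
  bool qtBlocked = true;  // true while the Qt-side thread must keep waiting

  // Called by the worker while it holds `lock`: lets the waiting thread go.
  void ReleaseQtThreadFromBlock(std::unique_lock<std::mutex> &lock)
  {
    assert(lock.owns_lock());
    this->qtBlocked = false;
    lock.unlock();
    this->cv.notify_one();
  }

  // Called by the Qt-side thread: blocks until the worker releases it.
  void WaitForWorker()
  {
    std::unique_lock<std::mutex> lock(this->mutex);
    this->cv.wait(lock, [this] { return !this->qtBlocked; });
  }
};

// Worker-side render step with an early-out path (for example, a request
// that is handled without rendering a full frame).
bool RenderStep(SyncSketch &sync, bool earlyOut)
{
  std::unique_lock<std::mutex> lock(sync.mutex);
  if (earlyOut)
  {
    // Without this call the waiting thread would never wake up.
    sync.ReleaseQtThreadFromBlock(lock);
    return false;
  }
  // ... regular rendering would happen here ...
  sync.ReleaseQtThreadFromBlock(lock);
  return true;
}

// Returns true if the waiting thread was released on the early-out path.
bool EarlyOutReleasesWaiter()
{
  SyncSketch sync;
  std::thread qtThread([&sync] { sync.WaitForWorker(); });
  const bool rendered = RenderStep(sync, true);
  qtThread.join();  // would hang forever if the release were missing
  return !rendered;
}
```

An RAII guard that releases the waiter in its destructor would cover every return path automatically; the explicit call is kept here only because it mirrors the fix suggested in the review comment.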
I fixed all mentioned issues except this one:
How do I repro that? Just keyboard movement? (I can't see the UI right now because it's compiling...) Update: Compiling done. Steps to repro?
Here are the steps I did:
Here's a gif showing the issue. The first time I tried to move the camera when the simulation was paused, it did not work. After hitting play, I did not have a problem. I then paused the simulation again - this time the first mouse movement worked but subsequent mouse drags didn't.
What version of Ubuntu are you on? I tried on these environments:
Update: No need to provide this info. I nailed down to the problem that This would normally be OK but due to how Qt run loop works, Qt will now think there is nothing else to update (until the worker thread sends more info, but the worker thread will get stuck in the next |
I'm tired. A whole day of debugging. Reimplemented the code. Deadlocks still appear in different forms. Impossible to repro in a standalone case. Only repros on the Intel machine (20.04), cannot repro at all on the AMD machine (18.04). By now I'm starting to question if it's a HW or library bug. A quick Google revealed there may be a glibc bug in condition variables. I've not confirmed this is our case and not just a bug in our code. I'm posting the link for reference for myself for tomorrow.
Ooof!!! Finally. It was our own fault after all (i.e. not the alleged glibc bug). The problem is that rendering changed between Qt versions, and newer versions offload some work to a worker thread that was previously done in the main thread. This causes potential deadlock scenarios that would not happen (or only extremely rarely) in older Qt versions. I've redone the synchronization code. It seems to work fine in both environments now. It should be ready for review and hopefully merge.
This fix depends on a fix in the ign-rendering module, because it depends on the new Camera::SwapFromThread function. Without it, compilation will fail. Affects gazebosim/gz-rendering#304. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
This avoids breaking the ign-rendering ABI while also reducing the amount of work to be done. Serialized work is easier to maintain and debug. Only CPU-bound scenarios would potentially benefit from parallel command generation (in terms of UI responsiveness). Parallel command generation can be added back later. Also fixed coding style. Refer to gazebosim/gz-rendering#304 (comment) for discussion. Affects ign-rendering#304. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
Also fixes a preexisting race condition when shutting down and improper uninitialization of the worker thread. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
Fix coding style. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
It's a good thing we went for serializing rendering. The way Qt implements the double buffer scheme using signals & slots is fundamentally flawed because it assumes the worker thread never needs to synchronize (e.g. to invalidate FBOs if the window resolution changes). Trying to synchronize can easily cause deadlocks if the Qt thread has spurious updates which don't end up emitting TextureInUse, as the worker thread is running slower than the Qt thread. A way to fix this could be to use a different synchronization mechanism where the main thread increases a request counter and the worker thread is constantly looping but only wakes up when that counter is > 0. For now, this will do. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
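The request-counter idea mentioned in the commit message above could look roughly like the following sketch. This is an illustration only, not code from the PR: RequestCounter, WaitAndTake, Stop and RenderRequestedFrames are hypothetical names, and the "rendering" is reduced to incrementing an integer. The main thread bumps a counter under a mutex; the worker loops, sleeping on a condition variable until the counter is greater than zero (or shutdown is requested).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch of a counter-based wake-up scheme.
class RequestCounter
{
public:
  // Main thread: ask for one more frame and wake the worker.
  void Request()
  {
    {
      std::lock_guard<std::mutex> lock(this->mutex);
      ++this->pending;
    }
    this->cv.notify_one();
  }

  // Main thread: ask the worker to exit once all requests are served.
  void Stop()
  {
    {
      std::lock_guard<std::mutex> lock(this->mutex);
      this->stop = true;
    }
    this->cv.notify_one();
  }

  // Worker thread: sleeps until pending > 0 or Stop() was called.
  // Returns true after consuming one request, false when it should exit.
  bool WaitAndTake()
  {
    std::unique_lock<std::mutex> lock(this->mutex);
    this->cv.wait(lock, [this] { return this->pending > 0 || this->stop; });
    if (this->pending == 0)
      return false;
    --this->pending;
    return true;
  }

private:
  std::mutex mutex;
  std::condition_variable cv;
  int pending = 0;
  bool stop = false;
};

// Issues numRequests requests and returns how many the worker served.
int RenderRequestedFrames(int numRequests)
{
  RequestCounter counter;
  int rendered = 0;
  std::thread worker([&counter, &rendered] {
    while (counter.WaitAndTake())
      ++rendered;  // stand-in for rendering one frame
  });
  for (int i = 0; i < numRequests; ++i)
    counter.Request();
  counter.Stop();
  worker.join();
  return rendered;
}
```

Because the main thread never waits for a specific signal from the worker, a spurious update on the main side only adds a request and cannot leave either thread blocked on the other.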
Would it be possible to use smart pointers to pass renderSync around? Right now RenderWindow instantiates the object and passes its pointer around to other classes. TextureNode even keeps a pointer as a member variable. But the ownership of that pointer is not documented. I think it's very possible that someone comes some months from now and adds some delete renderSyncs where they shouldn't.
I think that passing a shared pointer should make it explicit that the receiving classes shouldn't assume ownership and can safely release it when no longer needed.
Regarding smart ptrs: Yes and no. The relationships are the following
I have a bigger concern about this (the 'no' part of my answer), which is that the |
Turn some pointers into references Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
Thanks for tracking that down! The deadlock issue is fixed for me. I'm actually on Ubuntu Bionic with Qt 5.9.5 (and Nvidia driver) but the bug happened quite often. The changes look good to me, including the use of a reference for RenderSync.
This could be changed to a reference instead of pointer so that it is never deleted.
Thanks for the changes, TextureNode holding a reference should make this more future-proof 👍
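For readers following the ownership discussion, the arrangement agreed on above can be sketched as follows. The class names are hypothetical stand-ins (RenderSyncSketch, TextureNodeSketch, RenderWindowSketch) and the real classes are related differently in the plugin; the sketch only shows one object owning the sync state by value while another holds a non-owning reference to it.

```cpp
#include <cassert>

// Hypothetical stand-in for the shared synchronization state.
struct RenderSyncSketch
{
  int frames = 0;
};

// Holds a reference: there is no pointer member to delete by accident,
// and the reference cannot be reseated to another object.
class TextureNodeSketch
{
public:
  explicit TextureNodeSketch(RenderSyncSketch &_sync) : sync(_sync) {}
  void NewFrame() { ++this->sync.frames; }

private:
  RenderSyncSketch &sync;  // non-owning
};

// Owns the sync object by value, so its lifetime is tied to the window.
class RenderWindowSketch
{
public:
  RenderWindowSketch() : node(this->sync) {}
  void Update() { this->node.NewFrame(); }
  int Frames() const { return this->sync.frames; }

private:
  // Declared before `node` so it is constructed first and destroyed last.
  RenderSyncSketch sync;
  TextureNodeSketch node;
};
```

The trade-off is that the holder of the reference must not outlive the owner; in this sketch that is guaranteed by member declaration order, while in a real application it has to follow from how the objects are created and torn down.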
Thank you for iterating, this LGTM, although I can't reproduce the original issue.
I just spotted some missing docs.
Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
I changed it because it didn't work, but it must've been old code because it works now. Signed-off-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
OK all outstanding issues fixed. Should be (hopefully) ready for merge.
@osrf-jenkins run tests please
test failures look unrelated. latest changes look good to me. Merging.
This fix depends on a fix in the ign-rendering module, because it depends on the new Camera::SwapFromThread function.
Without it, compilation will fail.
The PR for ign-rendering can be found at gazebosim/gz-rendering#307
Forget about that PR for now
🦟 Bug fix
Fixes gazebosim/gz-rendering#304
Checklist